Estimating Quality in User-Guided Multi-Objective Bandits Optimization

Authors

Audrey Durand

Christian Gagné

Abstract

Many real-world applications are characterized by a number of conflicting performance measures. Because optimizing in a multi-objective setting yields a set of non-dominated solutions, a preference function is required to select the solution with the appropriate trade-off between the objectives. This preference function is often unknown, especially when it comes from an expert human user. However, if we could provide the expert user with an accurate estimate for each action, she would be able to pick her best choice. The question is: how accurate must these estimates be for her choice to remain the same as if she had access to the exact values? In this paper, we introduce the concept of preference radius to characterize the robustness of the preference function and provide guidelines for controlling the quality of estimates in the multi-objective setting. More specifically, we provide a general formulation of multi-objective optimization under the bandits setting and the pure exploration setting, with user feedback for articulating the preferences. We show how the preference radius relates to the optimal gap and how it can be used to analyze algorithms in both settings. Finally, we present experiments in the bandits setting, where we evaluate the impact of noise and delayed expert user feedback, and in the pure exploration setting, where we compare multi-objective Thompson sampling with uniform sampling.
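The abstract says that the preference radius relates to the optimal gap, but the paper's formal definition is not reproduced on this page. The following is therefore only an illustrative sketch under assumptions we introduce ourselves, not the authors' method: the expert's preference is modeled as a hypothetical linear scalarization with known weights, and estimation error is bounded per objective (infinity norm). Under those assumptions, each scalarized score moves by at most r times the L1 norm of the weights, so the preferred arm cannot change as long as r is smaller than half the gap between the best and second-best scores divided by that norm. The names `pareto_front`, `preferred_arm` and `linear_preference_radius` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` Pareto-dominates `b` (maximization): >= everywhere, > somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(means: List[Vector]) -> List[int]:
    """Indices of arms whose mean vectors are not dominated by any other arm."""
    return [
        i for i, m in enumerate(means)
        if not any(dominates(o, m) for j, o in enumerate(means) if j != i)
    ]


def linear_scores(means: List[Vector], weights: Vector) -> List[float]:
    # ASSUMPTION: the unknown expert preference is stood in for by a weighted sum.
    return [sum(w * x for w, x in zip(weights, m)) for m in means]


def preferred_arm(means: List[Vector], weights: Vector) -> int:
    """Arm the (hypothetical) linear-preference expert would pick."""
    scores = linear_scores(means, weights)
    return max(range(len(scores)), key=scores.__getitem__)


def linear_preference_radius(means: List[Vector], weights: Vector) -> float:
    """Per-objective error bound below which the preferred arm cannot change.

    Each score changes by at most r * ||w||_1 under an infinity-norm error r,
    so the choice is preserved whenever r < gap / (2 * ||w||_1).
    Requires at least two arms.
    """
    scores = sorted(linear_scores(means, weights), reverse=True)
    gap = scores[0] - scores[1]
    return gap / (2.0 * sum(abs(w) for w in weights))


def shift(vec: Vector, delta: float) -> List[float]:
    """Move every objective of a mean vector by the same amount."""
    return [x + delta for x in vec]


if __name__ == "__main__":
    means = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.4, 0.4]]
    weights = [0.7, 0.3]
    print(pareto_front(means))                        # [0, 1, 2]
    print(preferred_arm(means, weights))              # 0
    print(round(linear_preference_radius(means, weights), 4))  # 0.08

    # With nonnegative weights, pushing the best arm down and the runner-up
    # up in every objective is the worst case for a given error bound.
    inside = [shift(means[0], -0.07), shift(means[1], 0.07)] + means[2:]
    outside = [shift(means[0], -0.09), shift(means[1], 0.09)] + means[2:]
    print(preferred_arm(inside, weights))             # 0 (error below the radius)
    print(preferred_arm(outside, weights))            # 1 (error above the radius)
```

Arm 3 is dropped from the Pareto front because arm 1 dominates it. A linear scalarization is chosen only because its sensitivity to bounded errors has a closed form; other preference functions would need a different bound.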

Similar resources

Different Network Performance Measures in a Multi-Objective Traffic Assignment Problem

Traffic assignment algorithms are used to determine possible use of paths between origin-destination pairs and predict traffic flow in network links. One of the main deficiencies of ordinary traffic assignment methods is that in most of them one measure (mostly travel time) is usually included in objective function and other effective performance measures in traffic assignment are not considere...

Interactive Thompson Sampling for Multi-objective Multi-armed Bandits

In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MO...

Pareto Local Search for Alternative Clustering

Supervised alternative clusterings is the problem of finding a set of clusterings which are of high quality and different from a given negative clustering. The task is therefore a clear multi-objective optimization problem. Optimizing two conflicting objectives at the same time requires dealing with tradeoffs. Most approaches in the literature optimize these objectives sequentially (one objecti...

Multi-Objective X-Armed Bandits

Many of the standard optimization algorithms focus on optimizing a single, scalar feedback signal. However, real-life optimization problems often require a simultaneous optimization of more than one objective. In this paper, we propose a multi-objective extension to the standard X-armed bandit problem. As the feedback signal is now vector-valued, the goal of the agent is to sample actions in t...

Bandwidth and Delay Optimization by Integrating of Software Trust Estimator with Multi-User Cloud Resource Competence

Trust Establishment is one of the significant resources to enhance the scalability and reliability of resources in the cloud environment. To establish a novel trust model on SaaS (Software as a Service) cloud resources and to optimize the resource utilization of multiple user requests, an integrated software trust estimator with multi-user resource competence (IST-MRC) optimization mechanism is...

Journal title:

CoRR

Volume: abs/1701.01095

Publication date: 2017